Feature Selection via Coalitional Game Theory
We present and study the contribution-selection algorithm (CSA), a novel algorithm for feature selection. The algorithm is based on the multiperturbation Shapley analysis (MSA), a framework that relies on game theory to estimate usefulness. The algorithm iteratively estimates the usefulness of features and selects them accordingly, using either forward selection or backward elimination. It can optimize various performance measures over unseen data, such as accuracy, balanced error rate, and area under the receiver-operator-characteristic curve. Empirical comparison with several other existing feature selection methods shows that the backward elimination variant of CSA leads to the most accurate classification results on an array of data sets.
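The forward-selection loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it computes exact Shapley values by enumerating every ordering of the remaining candidates (feasible only for a handful of features, whereas the abstract speaks of estimating usefulness), and `toy_payoff`, the feature names, and the stopping `threshold` are all hypothetical stand-ins for a real classifier score measured on held-out data.

```python
from itertools import permutations


def shapley_contributions(candidates, selected, payoff):
    """Exact Shapley value of each candidate feature.

    Features in `selected` are always part of the coalition; the value of a
    candidate is its marginal payoff gain averaged over all orderings.
    """
    totals = {f: 0 for f in candidates}
    orders = list(permutations(candidates))
    for order in orders:
        coalition = set(selected)
        for f in order:
            before = payoff(coalition)
            coalition.add(f)
            totals[f] += payoff(coalition) - before
    return {f: totals[f] / len(orders) for f in candidates}


def forward_select(features, payoff, threshold=0.0):
    """Repeatedly add the highest-contribution feature until none helps."""
    selected, candidates = [], list(features)
    while candidates:
        contrib = shapley_contributions(candidates, selected, payoff)
        best = max(candidates, key=contrib.get)
        if contrib[best] <= threshold:
            break
        selected.append(best)
        candidates.remove(best)
    return selected


def toy_payoff(subset):
    """Hypothetical accuracy in percentage points (an assumption, not data).

    "a" and "b" are redundant (only the stronger one counts), "c" adds a
    small independent gain, and "d" is pure noise.
    """
    score = 50
    score += max(20 if "a" in subset else 0, 15 if "b" in subset else 0)
    score += 5 if "c" in subset else 0
    return score


print(forward_select(["a", "b", "c", "d"], toy_payoff))  # ['a', 'c']
```

Once "a" is selected, the redundant feature "b" contributes nothing and is never added, which is the behaviour that re-estimating contributions at every iteration is meant to provide.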
"We do not appreciate being experimented on": Developer and Researcher Views on the Ethics of Experiments on Open-Source Projects
A tenet of open source software development is to accept contributions from
users-developers (typically after appropriate vetting). But should this also
include interventions done as part of research on open source development?
Following an incident in which buggy code was submitted to the Linux kernel to
see whether it would be caught, we conduct a survey among open source
developers and empirical software engineering researchers to see what behaviors
they think are acceptable. This covers two main issues: the use of publicly
accessible information, and conducting active experimentation. The survey had
224 respondents. The results indicate that open-source developers are largely
open to research, provided it is done transparently. In other words, many would
agree to experiments on open-source projects if the subjects were notified and
provided informed consent, and in special cases also if only the project
leaders agree. While researchers generally hold similar opinions, they
sometimes fail to appreciate certain nuances that are important to developers.
Examples include observing license restrictions on publishing open-source code
and safeguarding the code. Conversely, researchers seem to be more concerned
than developers about privacy issues. Based on these results, it is recommended
that open source repositories and projects address use for research in their
access guidelines, and that researchers take care to ask permission also when
not formally required to do so. We note too that the open source community
wants to be heard, so professional societies and IRBs should consult with them
when formulating ethics codes.
Comment: 15 pages with 42 charts and 3 tables; accepted version
Shedding light on a living lab: the CLEF NEWSREEL open recommendation platform
In the CLEF NEWSREEL lab, participants are invited to evaluate news recommendation techniques in real time by providing news recommendations to actual users who visit commercial news portals to satisfy their information needs. Communication between participants and users plays a central role in this lab. It is enabled by the Open Recommendation Platform (ORP), a web-based platform which distributes users' impressions of news articles to the participants and returns their recommendations to the readers. In this demo, we illustrate the platform and show how requests are handled to provide relevant news articles in real time.
Random Filters for Compressive Sampling and Reconstruction
We propose and study a new technique for efficiently acquiring and reconstructing signals based on convolution with a fixed FIR filter having random taps. The method is designed for sparse and compressible signals, i.e., ones that are well approximated by a short linear combination of vectors from an orthonormal basis. Signal reconstruction involves a non-linear Orthogonal Matching Pursuit algorithm that we implement efficiently by exploiting the nonadaptive, time-invariant structure of the measurement process. While simpler and more efficient than other random acquisition techniques like Compressed Sensing, random filtering is sufficiently generic to summarize many types of compressible signals and generalizes to streaming and continuous-time signals. Extensive numerical experiments demonstrate its efficacy for acquiring and reconstructing signals sparse in the time, frequency, and wavelet domains, as well as piecewise smooth signals and Poisson processes.
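The measurement step in this abstract can be sketched as below, under several assumptions that do not come from the paper: circular convolution, Gaussian taps, downsampling by a fixed `step`, and a signal that is sparse in the time domain. The recovery function performs only a single greedy atom selection, which is the first iteration of a matching pursuit, not the full Orthogonal Matching Pursuit or the efficient implementation the authors describe.

```python
import random


def random_filter_measure(signal, taps, step):
    """Circularly convolve `signal` with the FIR `taps`, keep every `step`-th sample."""
    n = len(signal)
    return [
        sum(h * signal[(i - j) % n] for j, h in enumerate(taps))
        for i in range(0, n, step)
    ]


def best_matching_atom(measurements, n, taps, step):
    """Return the time index whose measured unit impulse best matches the data.

    Each candidate column is the measurement of a unit impulse; the score is
    the absolute correlation with the data, normalised by the column norm.
    """
    best_index, best_score = -1, -1.0
    for k in range(n):
        impulse = [0.0] * n
        impulse[k] = 1.0
        column = random_filter_measure(impulse, taps, step)
        norm = sum(c * c for c in column) ** 0.5
        score = abs(sum(c * y for c, y in zip(column, measurements))) / norm
        if score > best_score:
            best_index, best_score = k, score
    return best_index


rng = random.Random(0)                      # fixed seed, for repeatability only
taps = [rng.gauss(0.0, 1.0) for _ in range(8)]

signal = [0.0] * 64                         # 1-sparse test signal (hypothetical)
signal[37] = 2.5

y = random_filter_measure(signal, taps, step=2)   # 32 measurements of 64 samples
recovered = best_matching_atom(y, len(signal), taps, step=2)
print(len(y), recovered)
```

For a signal with several nonzero entries, a full pursuit would subtract the contribution of the chosen atoms (via a least-squares fit) and repeat the selection on the residual.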
Brown representability for space-valued functors
In this paper we prove two theorems which resemble the classical
cohomological and homological Brown representability theorems. The main
difference is that our results classify small contravariant functors from
spaces to spaces up to weak equivalence of functors.
In more detail, we show that every small contravariant functor from spaces to
spaces which takes coproducts to products up to homotopy and takes homotopy
pushouts to homotopy pullbacks is naturally weakly equivalent to a
representable functor.
The second representability theorem states: every contravariant continuous
functor from the category of finite simplicial sets to simplicial sets taking
homotopy pushouts to homotopy pullbacks is equivalent to the restriction of a
representable functor. This theorem may be considered as a contravariant analog
of Goodwillie's classification of linear functors.
Comment: 19 pages, final version, accepted by the Israel Journal of
Mathematics
When Are Names Similar Or the Same? Introducing the Code Names Matcher Library
Program code contains functions, variables, and data structures that are
represented by names. To promote human understanding, these names should
describe the role and use of the code elements they represent. But the names
given by developers show high variability, reflecting the tastes of each
developer, with different words used for the same meaning or the same words
used for different meanings. This makes comparing names hard. A precise
comparison should be based on matching identical words, but also take into
account possible variations on the words (including spelling and typing
errors), reordering of the words, matching between synonyms, and so on. To
facilitate this we developed a library of comparison functions specifically
targeted to comparing names in code. The different functions calculate the
similarity between names in different ways, so a researcher can choose the one
appropriate for his specific needs. All of them share an attempt to reflect
human perceptions of similarity, at the possible expense of lexical matching.
Comment: 20 pages. Download from https://pypi.org/project/namecompare
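To make the kinds of variation listed in this abstract concrete, here is a minimal sketch of an order-insensitive, typo-tolerant name comparison built only on the Python standard library. It does not use or reproduce the API of the library the abstract announces; `split_name` and `name_similarity` are hypothetical names, and synonym matching is not attempted.

```python
import re
from difflib import SequenceMatcher


def split_name(name):
    """Split snake_case and camelCase identifiers into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def name_similarity(a, b):
    """Similarity in [0, 1] that ignores word order and tolerates typos.

    Every word is paired with its closest word in the other name; the
    per-word scores are averaged in both directions.
    """
    words_a, words_b = split_name(a), split_name(b)
    if not words_a or not words_b:
        return 0.0

    def directed(src, dst):
        return sum(
            max(SequenceMatcher(None, w, v).ratio() for v in dst) for w in src
        ) / len(src)

    return (directed(words_a, words_b) + directed(words_b, words_a)) / 2


print(split_name("maxItemCount"))                         # ['max', 'item', 'count']
print(name_similarity("maxItemCount", "item_count_max"))  # 1.0
print(name_similarity("maxItemCount", "item_cuont_max"))  # below 1.0 (typo)
```

Averaging in both directions penalises a name that contains extra words the other lacks, which a one-sided best-match score would ignore.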
Comparing the Efficacy of Drug Regimens for Pulmonary Tuberculosis: Meta-analysis of Endpoints in Early-Phase Clinical Trials
Background: A systematic review of early clinical outcomes in tuberculosis was undertaken to determine ranking of efficacy of drugs and combinations, define variability of these measures on different endpoints, and to establish the relationships between them. Methods: Studies were identified by searching PubMed, Medline, Embase, LILACS (Latin American and Caribbean Health Sciences Literature), and reference lists of included studies. Outcomes were early bactericidal activity results over 2, 7, and 14 days, and the proportion of patients with negative culture at 8 weeks. Results: One hundred thirty-three trials reporting phase 2A (early bactericidal activity) and phase 2B (culture conversion at 2 months) outcomes were identified. Only 9 drug combinations were assessed on >1 phase 2A endpoint and only 3 were assessed in both phase 2A and 2B trials. Conclusions: The existing evidence base supporting phase 2 methodology in tuberculosis is highly incomplete. In future, a broader range of drugs and combinations should be more consistently studied across a greater range of phase 2 endpoints.